I. INTRODUCTION



The fundamental dynamical processes of evolution are connected with dynamical processes based on sequences. This statement is supported by the basic Darwinian mechanism of molecular evolution. The genetic message - the genotype - is coded by a DNA sequence, and the whole cell dynamics is determined by an interplay of proteins and polynucleotides.

The phenotype comprises all properties characterizing a species, and the assignment of genotypes to phenotypes is called the genotype-phenotype map. The process of Darwinian selection is based on the fitness of phenotypes. This valuation process may be schematically represented by

genotype -> phenotype -> fitness

In fact all forms of life are determined by games based on sequences of amino acids which are valuated through the corresponding phenotype (Gatlin, 1972; Ratner, 1983; Volkenstein, 1975).

So far there is no complete model for any concrete biological system. This is due to the enormous difficulty of the biological valuation process. Special models for the evolution of prebiological systems were investigated e.g. by Eigen and Schuster (Eigen et al., 1977, 1978; Eigen et al., 1989) and by Anderson and Stein (Anderson, 1983; Anderson et al., 1983).

This work is devoted to the investigation of games based on sequences. These games are characterized by an evolutionary dynamics based on a certain artificial valuation process. As a prototype we consider the frustrated game proposed by Engel (Ebeling, Engel, Feistel, 1990).

The idea of this paper is that the genotype-phenotype map transforms the rugged valuation landscape of fitness values of genotypes to a smooth fitness function of phenotypes and in this way facilitates the process of evolutionary search. On the other hand this may be interpreted as a mapping of an optimization problem to an intermediate level of coding, which is reflected in the representation problem of evolutionary strategies, genetic algorithms and genetic programming (Rechenberg, 1973; Goldberg, 1989; Koza, 1992).

The fitness landscape proposed by Engel shows a rugged structure which is related to the frustration of the problem (Ebeling et al., 1994). We use the results of the first three sections to characterize the valuation landscape. The main result is the determination of the density of states n(F) by simulation and theoretical investigation.

Dynamical processes based on strings may also be of some interest for other fields of scientific activity. As is well known, the dynamics of information processing in human systems is based on the storage and exchange of messages coded by strings of letters. Further we mention that many optimization problems, e.g. the travelling salesman problem, may be mapped onto games with linear strings of letters. Finally we would like to point out that by the method of symbolic dynamics any trajectory of a dynamical system may be mapped onto a string of letters over a certain alphabet (Ebeling et al., 1991).



II. GENERAL PROPERTIES OF A FITNESS 
LANDSCAPE ON SEQUENCES 



In the following we consider a set of N sequences of length L, forming a certain region in the sequence space. We assume that the elements of the sequences are taken from an alphabet consisting of $\lambda$ types of letters. The complete set of different sequences of length L consists of $N_L = \lambda^L$ elements. For L ~ 100 this number is astronomical. Most of the possible sequences may be forbidden in realistic systems, leaving only a subset of N admitted ones for participation in the game.

Let us consider a mutation that takes place in a sequence. To measure the size of the change we need a metric (distance) on the sequence space. The geometry of the space admits several metrics, and we have to choose one of them. A standard metric is given by the Hamming prescription: the Hamming distance between two sequences is defined as the number of positions at which they do not coincide.






We may define the neighbourhood structure of the sequence space by means of this metric: two sequences s, s' with a Hamming distance h(s, s') = 1 are neighbours. The neighbourhood structure is given by the adjacency matrix with A(s, s') = 1 for neighbours and A(s, s') = 0 otherwise. A geometrical visualization of the neighbourhood structure may be given by a graph with edges connecting the neighbouring sequences. For L = 1, 2 with the alphabet {A, B, C, D} this graph is shown in Fig. 1.
FIG. 1. Neighbourhood structures of the sequence space for L = 1, L = 2 due to the Hamming metric.

In the case L = 2 we draw only one of the four identical components of the adjacency graph; all four components are strongly connected with each other. For L ~ 100 the space has an extremely large number of points. The neighbourhood structure of the sequence space has the feature that all its points are near to one another, since the maximal number of non-coincidences is L and the adjacency graph is very strongly connected.
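
The Hamming distance and the resulting neighbourhood structure can be illustrated with a minimal sketch. The function names `hamming` and `neighbours` and the default alphabet string are illustrative assumptions, not part of the original text.

```python
def hamming(s, t):
    """Number of positions at which two equal-length sequences differ."""
    if len(s) != len(t):
        raise ValueError("sequences must have the same length")
    return sum(a != b for a, b in zip(s, t))


def neighbours(s, alphabet="ABCD"):
    """All sequences at Hamming distance 1 from s (single point mutations)."""
    result = []
    for i, letter in enumerate(s):
        for other in alphabet:
            if other != letter:
                result.append(s[:i] + other + s[i + 1:])
    return result


print(hamming("ABCD", "ABDD"))   # 1
print(len(neighbours("AB")))     # 6 = L * (number of letters - 1)
```

Every sequence of length L over four letters has 3L neighbours, while the largest possible distance is only L, which is the sense in which all points of the space are near to one another.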

Without loss of generality we consider the sequences as 
the genotypes of the individuals. All possible sequences 
may be considered as elements of a metric space, the 
genotype space G. 

An individual comes into being by expression of a genotype $s \in G$. The individual has properties which make up its phenotype $\phi$; in this way we also introduce a phenotype space Q. The genotype-phenotype map $\Phi : s \in G \to \phi \in Q$ assigns the genotypes s to phenotypes $\phi$. Each phenotype is valuated in the selection process, which determines the fitness function $F(\phi)$. Once this has happened we can say: the genotype s was valuated with the fitness value $V(s) = F(\phi)$, $\phi = \Phi(s)$. The surface formed by the fitness values on the sequence space is called the valuation landscape (Conrad, 1983; Conrad et al., 1992). Strictly speaking the landscape consists only of a discrete set of points; for visualization we connect these points by a surface, e.g. by a piecewise linear approximation.

The genotype-phenotype map $\Phi$ is the representation of genotypes in the valuation process which has built the valuation landscape V by genotype expression of phenotypes and selection with F. The fitness value V(s) may be an element of a real vector space. For simplification we confine ourselves to the case that $V(s) \in \mathbb{R}^+$ is just a positive real number. Thus the valuation process is given by

$$
\begin{array}{ccc}
G & \xrightarrow{\;V\;} & \mathbb{R}^+ \\
{\scriptstyle \Phi} \searrow & & \nearrow {\scriptstyle F} \\
 & Q &
\end{array}
\tag{1}
$$

Let us now introduce the so-called density of states, a term borrowed from solid-state physics, as a first measure of the structure of the fitness landscape. We assume that the value is bounded from above and from below in the set of sequences, $V_{min} < V_i < V_{max}$. We define the total number of sequences having the value $V_i < V$ by N(V) and the relative occurrence by $S(V) = N(V)/N_L$, where $N_L$ is the total number of admitted sequences of length L. Here N(V) and S(V) are step functions converging to the values $N_L$ or 1 respectively.

We expect that the sequences having values in the interval [V, V + dV] form a kind of density which we call the density of states n(V). The density of states, which formally is the derivative of N(V), consists of delta peaks. Correspondingly a normalized density of states $\sigma(V)$ may be derived from S(V). Later on we shall use these concepts for a structural characterization of the landscape. We mention that the integral and differential number densities are invariant with respect to any ordering or choice of the neighbourhood structure on the sequence space.
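
For a finite list of fitness values the quantities above reduce to simple counting. The following sketch assumes the cumulative count is taken with "less than or equal" so that the last step reaches the total number of sequences; the function name `density_of_states` and the sample values are hypothetical.

```python
from collections import Counter


def density_of_states(values):
    """Return (n, N, S) for a finite list of fitness values.

    n[v]: number of sequences with value exactly v (weight of the delta peak)
    N[v]: number of sequences with value <= v (assumed convention)
    S[v]: N[v] divided by the total number of sequences
    """
    n = Counter(values)
    total = len(values)
    N, S, running = {}, {}, 0
    for v in sorted(n):
        running += n[v]
        N[v] = running
        S[v] = running / total
    return dict(n), N, S


n, N, S = density_of_states([1, 1, 2, 3, 3, 3])
print(n)   # {1: 2, 2: 1, 3: 3}
print(N)   # {1: 2, 2: 3, 3: 6}
```

Because only the multiset of values enters, the result does not depend on how the sequences are ordered or which sequences are neighbours.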



III. THE SMOOTHING REPRESENTATION AND 
THE GENOTYPE-PHENOTYPE MAPPING 

At the beginning of a general investigation of evolution we have to ask the question: why does evolution work on sequence spaces? In fact we know that evolution finds the extrema of a smooth fitness landscape very well. There exists a gradient path to the extrema, and the evolutionary dynamics is able to follow this path without getting stuck in local extrema. On rugged landscapes it is very difficult for evolution (and all other optimization strategies) to find a way to the extrema (Kaufman, 1989, 1990, 1993). Consequently evolution had to establish a smoothing representation of the valuation landscape on sequence spaces for successful search.

On the other hand, there is evidence that evolution does not valuate genotypes directly: the fitness function $F(\phi)$ values the phenotypes in the selection process, and the fitness values of genotypes $V(s) = F(\Phi(s))$ are then only given by means of the genotype-phenotype map.

It is highly probable that the smoothing representation and the genotype-phenotype map are two interpretations of the same fact: evolution had to choose the genotype-phenotype map in such a manner that the representation of genotypes leads to a smooth fitness function. This is a necessary condition for efficient search in sequence spaces.

Indeed, it is very well known that the representation problem in evolutionary strategies, genetic algorithms and genetic programming (Rechenberg, 1973; Goldberg, 1989; Koza, 1992) is the crucial point for success. Finding a representation is a complicated problem - there exists no algorithm to choose a good representation - it is only solvable with human inspiration. When a good representation has been found, evolutionary algorithms work very well. Exactly in this sense, evolution had to find a smoothing representation for genotypes, and the very fact that we are able to think about it shows that it was found.

First let us define the meaning of a smooth representation of the fitness function on discrete spaces. We demand that the function is $\epsilon$-continuous in the following sense:

Definition 1 Let $F : Q \to \mathbb{R}^+$ be a function on a discrete space Q with neighbourhood structure A. We call
F e-contmuous with the degree max ^( s ) ; 

$$ \epsilon = \left\langle\, |V(s) - V(s')| \,\right\rangle_{s, s' \in G,\; A(s, s') = 1} . \tag{2} $$

A function with a higher degree is more continuous. 
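
As a small illustration, the quantity in Eq. (2) can be evaluated on a toy graph. The sketch below assumes that the angle brackets denote the mean of the absolute fitness difference over all neighbouring pairs; the five-state path and both value lists are hypothetical.

```python
def epsilon(values, edges):
    """Mean absolute fitness difference over all neighbouring pairs."""
    diffs = [abs(values[a] - values[b]) for a, b in edges]
    return sum(diffs) / len(diffs)


# Hypothetical toy landscape: five states on a path 0-1-2-3-4.
path = [(0, 1), (1, 2), (2, 3), (3, 4)]
rugged = [0, 4, 1, 3, 2]
smooth = [0, 1, 2, 3, 4]   # the same fitness values in a different order

print(epsilon(rugged, path))   # 2.5
print(epsilon(smooth, path))   # 1.0
```

Both landscapes carry the same set of values; only the assignment of values to neighbouring states differs, and the reordered one has the smaller average jump between neighbours.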

The phenotype should be something like the properties used by common sense to characterize the fitness of species (e.g. strength, robustness, speed), which directly determine the fitness of an individual. A small alteration in the phenotype causes a small change in fitness. That is the principle of strong causality (Rechenberg, 1973). Thus we give the following definition:

Definition 2 The phenotype variables $\phi^i$ are parameters which are able to represent the fitness function $F : Q \to \mathbb{R}^+$ as a bijective function with a sufficiently high degree of $\epsilon$-continuity on the phenotype space $Q = \{\phi = (\phi^1, \ldots, \phi^n)\}$.

The $\epsilon$-continuity of F guarantees that a small change $\Delta\phi$ corresponds to a small fitness difference $\Delta F(\phi)$. The bijectivity of F ensures that the fitness of an individual is uniquely determined by its phenotype and vice versa.

Now we ask the question: is it possible to construct for a given problem a genotype-phenotype map $\phi = \Phi(s)$ in such a way that for all possible genotype states s and phenotype states $\phi$ the fitness function $F(\phi)$ satisfies definition 2? In general, can such a map $\Phi$ exist?

Every genotype state s has the fitness value $V(s) = F(\Phi(s))$, but it is possible that many $s \in G$ have the same value V. On the other hand, for each fitness value $F(\phi)$ there exists a unique phenotype state $\phi$. Thus we formulate the following theorem:

Theorem 1 Two genotypes $s, s' \in G$ are equivalent by the relation

$$ s \sim s' \iff V(s) = V(s') . \tag{3} $$

The phenotype states $\phi$ are the equivalence classes of the genotypes with respect to the valuation V(s) of genotype states s and fitness value F,

$$ \phi(F) = \{ [s] : [s] \in G/\!\sim,\; V(s) = F \} . \tag{4} $$

There exists a unique ordering procedure changing the neighbourhood structure A of $G/\!\sim$ in such a manner that the fitness function F has a higher or equal degree than the valuation landscape V with respect to $\epsilon$-continuity. From this, the genotype-phenotype map

$$ \Phi : G \to Q \tag{5} $$

can be uniquely determined by the ordering procedure. For the proof see appendix A.

Let us explain the idea of the proof. The sequence space G has the structure of a graph represented by the adjacency matrix A. In general the building of equivalence classes leads to a change in the topology. Fig. 2 illustrates this fact in this special case.

FIG. 2. Neighbourhood structures of G and G/~.



The degree of continuity of V does not change by this process. If we consider point mutations of sequences, then these mutations form a group $\mathcal{G}$ generated by a finite number of elements $g_1, \ldots, g_\nu$ (generators). There is a finite number of point mutations transforming a sequence into another one which has the same fitness value. If we represent the sequence by a point and every group generator by a line, then the process of building equivalence classes can be described by the graph in Fig. 3.

FIG. 3. Action of the mutation on G and on G/~.



Thus the ambiguity comes from loops generated by the building of equivalence classes and represented by the mutation group $\mathcal{G}$. With two simple rules (see appendix A) these loops can be eliminated to get a valuation with higher degree. By definition this is the property of the fitness function, and we can identify the phenotype space with the space of equivalence classes together with the rules to smooth the landscapes. This defines the genotype-phenotype map $\Phi$ by a unique procedure. Fig. 4 is a good example of this process.

FIG. 4. Genotype-phenotype map

Obviously, the best genotype-phenotype map is the 
RANK-operator defined by ordering the fitness values to 
maximize the degree of the fitness function (fig. 5). The 
existence of the RANK-operator is an example of the or- 
dering procedure introduced in theorem 1. 
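
A minimal sketch of such a rank-based map is given below, assuming that the RANK operator simply replaces each fitness value by its position in the sorted list of distinct values. The function name `rank_map` and the four-genotype valuation are hypothetical.

```python
from collections import defaultdict


def rank_map(valuation):
    """Map each genotype to the rank of its fitness value.

    valuation: dict genotype -> fitness value V(s).
    Returns (phi, classes): phi[s] is the rank (0 = lowest fitness) and
    classes[r] lists the genotypes that share rank r.
    """
    levels = sorted(set(valuation.values()))
    rank_of = {v: r for r, v in enumerate(levels)}
    phi = {s: rank_of[v] for s, v in valuation.items()}
    classes = defaultdict(list)
    for s, r in phi.items():
        classes[r].append(s)
    return phi, dict(classes)


V = {"AA": 2.0, "AB": 5.0, "BA": 2.0, "BB": 3.5}
phi, classes = rank_map(V)
print(phi)       # {'AA': 0, 'AB': 2, 'BA': 0, 'BB': 1}
print(classes)   # {0: ['AA', 'BA'], 2: ['AB'], 1: ['BB']}
```

Genotypes with equal fitness fall into one class, and the classes are arranged on a line in order of increasing fitness, so that neighbouring classes differ by one fitness level.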

In section II we introduced the density of states n(F). 
Now we can explain the meaning of this measure: Geno- 
types with the same fitness value build an equivalence 
class - the corresponding phenotype. The number of 
genotypes of a certain equivalence class (phenotype) is 
the density of states (fig. 5). Obviously, the density of 
states is only related to the ordering of fitness values but 
there are no references to the geometry and topology of 
the fitness landscape. 

The density of states answers the question: how difficult is it to find a certain phenotype? The density of states will be very low on high fitness levels. Problems with a very fast decay of n(F) will be very hard to solve. In this sense we may say: the density of states is a classifying measure of fitness landscapes. If we know n(F) for two problems we are able to decide which problem is more difficult.




[Diagram: genotypes grouped into classes of equal fitness and ordered along the fitness axis F = 1, ..., 6]

FIG. 5. The RANK operator as genotype-phenotype map and the density of states n(F).



IV. TOY MODELS OF THE EVOLUTIONARY 
DYNAMICS 

The evolutionary dynamics takes place on the spaces described in the previous sections. The valuation process of the evolutionary dynamics is characterized by an intricate genotype-phenotype map and a fitness function on the phenotype space. For simplicity of the introduction we confine ourselves to the case that the genotype and phenotype spaces are identical (the genotype-phenotype map is the identity).

We consider a genotype space of sequences $s \in G$ and choose a fixed numbering of the genotypes (Gödel number) 1, ..., i, ..., n. The simplest model of an evolutionary dynamics is the Fisher-Eigen model, which is based on the assumption that the competing objects i = 1, ..., n have different reproduction rates $V_i$. These rates now play the role of the fitness. The evolutionary dynamics is given by the differential equations (Fisher, 1930; Eigen, 1971)

$$ \dot{x}_i = (V_i - \langle V \rangle)\, x_i , \qquad \langle V \rangle = \sum_i V_i x_i , \qquad \sum_i x_i = 1 , \tag{6} $$

where $x_i$ is the fraction of individuals with the genotype i in the population. The species with values better than the "social" average $\langle V \rangle$ will succeed in the competition and the others will fail. Finally only the species with the largest rate $V_{max}$ will survive.
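
The selection behaviour of the Fisher-Eigen equations can be checked numerically. The sketch below uses an explicit Euler scheme, which is an assumption made for brevity (any ODE integrator would do); the rates, step size and function names are hypothetical.

```python
def fisher_eigen_step(x, V, dt):
    """One explicit Euler step of dx_i/dt = (V_i - <V>) x_i."""
    mean_V = sum(v * xi for v, xi in zip(V, x))
    x_new = [xi + dt * (v - mean_V) * xi for v, xi in zip(V, x)]
    total = sum(x_new)
    return [xi / total for xi in x_new]   # renormalise against round-off


def simulate(x0, V, dt=0.01, steps=5000):
    """Integrate the selection dynamics from the initial fractions x0."""
    x = list(x0)
    for _ in range(steps):
        x = fisher_eigen_step(x, V, dt)
    return x


rates = [1.0, 2.0, 3.0]
final = simulate([1 / 3, 1 / 3, 1 / 3], rates)
print([round(xi, 6) for xi in final])
```

Starting from equal fractions, the population concentrates on the genotype with the largest rate, while the sum of the fractions stays equal to one.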

In this way the Fisher-Eigen game explores the fitness landscape, finding out the peaks. The Fisher-Eigen model is the simplest of all models of competition. It refers to an oversimplified case and one may say that the model reflects only pseudo-competition, since there is no real interaction between the species. Evolution needs not only competition but also mutations. We introduce mutation by an additional term in the dynamic equation

$$ \dot{x}_i = k\, (V_i - \langle V \rangle)\, x_i + \mu \sum_j \left[ A_{ij} x_j - A_{ji} x_i \right] . \tag{7} $$

The scalar parameter k was introduced to allow a change of the strength of competition. One may consider three cases (Boseniuk et al., 1987; Boseniuk et al., 1990):

(i) Model of Darwinian evolution:

$$ k = 1 , \qquad A_{ij} = C_{ij} , $$

where $C_{ij}$ is a symmetric matrix, $C_{ij} = C_{ji}$, of mutations. The symmetry of the mutation rates models the isotropy of biological mutations.

(ii) Model of Boltzmann evolution (Metropolis algorithm):

$$ k = 0 , \quad \mu = 1 , \qquad A_{ij} = C_{ij} \begin{cases} 1 & \text{if } V_i > V_j \\ \exp[(V_i - V_j)/T] & \text{else} \end{cases} \tag{8} $$

Here the real positive parameter T is the "temperature" of the Boltzmann search.

(iii) Mixed Boltzmann-Darwin strategies:

In this case we make the choice 0 < k < 1 and the Boltzmann-type mutation rate with T > 0.

We mention that the case (i) corresponds now to k = 1 and T = 0. Further the case (ii) corresponds to k = 0 and T > 0.
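
The Boltzmann-type mutation rate can be written as a short function. This is a sketch assuming that improvements are accepted with the full rate and deteriorations with the factor exp[(V_i - V_j)/T]; the function name `boltzmann_rate` and the default `C_ij = 1` are illustrative.

```python
import math


def boltzmann_rate(V_i, V_j, T, C_ij=1.0):
    """Rate A_ij for the mutation j -> i in the Boltzmann strategy."""
    if V_i > V_j:
        return C_ij                       # improvements keep the full rate
    return C_ij * math.exp((V_i - V_j) / T)


print(boltzmann_rate(3.0, 2.0, T=1.0))    # 1.0
print(boltzmann_rate(2.0, 3.0, T=1.0))    # exp(-1), about 0.368
```

At high temperature the rate for deteriorations approaches the symmetric rate `C_ij`, at low temperature it is strongly suppressed.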

The basic elements of games playing the evolutionary dynamics are competitive self-reproduction, which occurs with the rate $V_i$, and mutation, which produces a genotype i from j with the rate $A_{ij}$. Selection is introduced by the condition of constant population size.



V. A FITNESS FUNCTION WITH TWO 
FRUSTRATED PERIODICITIES 

In the general case the valuation of a bio-sequence may be extremely complicated, since it is based not on the primary structure itself but on a valuation of the corresponding phenotype. Here we restrict ourselves to an extremely simple model which was proposed by Engel in 1989 (Ebeling et al., 1990). We also mention the similarity to 1D spin glasses with two different interaction ranges. We simplify the Engel model by closing the sequences to rings. In this model the valuation V(s) of the sequences s of length L over the alphabet {A, B, C, D} is based on the following simple rules:

(i) If a letter is in alphabetical order ABCDA... with the following letter then the fitness will increase by one, i.e. V = V + 1.

(ii) If the letters on positions i and i + p are the same then V = V + b.

If the i-th element of a sequence s is denoted by $s_i$, the valuation function (fitness value on the genotype space) is given by

$$ V(s) = \sum_{i=1}^{L} \left[ \alpha(s_i) + b\, \pi(s_i) \right] \tag{9} $$

where $\alpha(s_i) = 1$ if $s_i, s_{i+1}$ are in alphabetical order and $\pi(s_i) = 1$ if $s_i = s_{i+p}$; otherwise $\alpha(s_i) = \pi(s_i) = 0$. The first rule (i) favours alphabetical sequences ABCDABCDABC...D with period 4. The second rule (ii) favours periodic repetitions with the period p. If $p \neq 4$ then the tendencies to generate strings with period 4 or p are contradictory, i.e. the system is frustrated. We choose p = 5 and set $b = b_c = 1/L\, [L/p]$ (Ebeling et al., 1990); $b \ll b_c$ favours alphabetical sequences and $b \gg b_c$ p-periodic ones.

The valuation landscape V has a rugged structure, i.e. sequences which are quite near with respect to their Hamming distance may have very different values. In the third section a general algorithm to smooth the fitness landscape was introduced. Now we want to present an example of this procedure in the case of the evolutionary game. Because of the linear structure of the fitness function the building of equivalence classes leads also to a linear structure. If we mutate an element $s_i$ of a sequence, the change of fitness may be independent of the special letter of $s_i$, e.g. $\Delta V(AB \to BB) = \Delta V(AB \to CB)$. Because of this fact we introduce a new description of the sequences which will allow us to find out the equivalence classes (phenotypes) of genotypes.

At first we look for all possible transitions of sequences due to point mutations. A point mutation of the element $s_i$ of a sequence changes only the alphabetical orders $\alpha_i = \alpha(s_i)$, $\alpha_{i-1}$ and the p-periodicities $\pi_i$, $\pi_{i-p}$ (because of the direction in the arrangement of letters). The set of all possible transitions $(\alpha_{i-1}, \alpha_i)_t \to (\alpha_{i-1}, \alpha_i)_{t+1}$ from time step t to t + 1 is given by

[Transition table between the states (00), (01), (10), (11), with transitions labelled by the generators $e, g_1, g_2, g_3, g_4$ and their inverses] (10)

These transitions form a group $\mathcal{G}_\alpha$ generated by $\{e, g_1, g_2, g_3, g_4\}$ with respect to the relation $g_1 g^{-1} g_3 = e$, and the group operation is the concatenation of generators. That means $\mathcal{G}_\alpha$ has the structure of a free group.

For the p-periodicities $(\pi_{i-p}, \pi_i)$ one can easily find the same structure of transitions. Thus, the point mutation group $\mathcal{G}_\pi$ has the same structure as $\mathcal{G}_\alpha$, i.e.

$$ \mathcal{G}_\pi \cong \mathcal{G}_\alpha , \tag{11} $$

both groups are isomorphic. The two states $\alpha_i$ and $\pi_i$ of the element $s_i$ define the scheme state

$$ f_i = \begin{cases} @ : & \text{alphabetical} \\ \# : & p\text{-periodic} \\ \$ : & \text{both} \\ * : & \text{neither} \end{cases} \tag{12} $$

Together with the action of group elements on a scheme $f = f_1 \cdots f_i \cdots f_L$ we obtain the following fitness change for every point mutation $g \in \mathcal{G}_\alpha \times \mathcal{G}_\pi$

$$ \Delta f(g) = f(f_{t+1}) - f(f_t) , \qquad f_{t+1} = g f_t , \tag{13} $$

where the index t denotes the time step. Thus, the possible changes of fitness by one point mutation $g = (g_\alpha, g_\pi)$ read as

[Table of the fitness changes $\Delta f(g_\alpha)$ for the generators $e, g_1, g_2, g_3, g_4$] (14)

with $\Delta f(g) = \Delta f(g_\alpha) + \Delta f(g_\pi)$, $\Delta f(g^{-1}) = -\Delta f(g)$.

The description of all possible transitions by means of schemes f leads to a characterization of equivalence classes of sequences with respect to the fitness levels.

(i) Class of interchangeable letters:

We choose a new encoding of the letters {A, B, C, D} -> {@, #, $, *} which transforms the sequence s = s_1 ... s_L into the scheme f = f_1 ... f_L; e.g. the sequences BCDADAA and CDABABB belong to the same class of the scheme f = @@@*$#@ with V(f) = 5 + 2b. We can interchange the letters A -> B, B -> C, C -> D, D -> A without changing the scheme and fitness value.

(ii) Class of permutable schemes:

The schemes @@@*$#@ and @@@@*$# are the same up to one translation. To characterize the classes of schemes which differ only by translation and permutation we encode the scheme @@@*$#@ -> (1,1,4,1) by the scheme vector counting the numbers of ($, #, @, *) in the scheme. E.g. for b = 0.1 and L = 7 we can find

V     scheme     $   #   @   *   example
6.0   @@@@@*@    0   0   6   1   bcdabca
5.2   *$$@@*@    2   0   3   2   babcdaa
5.2   @@@*$#@    1   1   4   1   bcdadaa

(iii) Class of fitness levels:

The scheme vectors (1,1,4,1) and (2,0,3,2) have the same fitness V = 5 + 2b. We build the equivalence classes of fitness levels by counting the numbers of alphabetical and p-periodic letters $(\alpha, \pi)$, e.g. (1,1,4,1) -> (5,2), (2,0,3,2) -> (5,2). The state $(\alpha, \pi)$ uniquely determines the fitness value $V(\alpha, \pi) = \alpha + b\,\pi$. Thus we call the numbers $(\alpha, \pi)$ of alphabetical and p-periodic letters of a sequence s the phenotype $\phi$.

Thus, the genotype-phenotype map of the system is given by

$$ \Phi : s \in G \;\to\; \phi = (\alpha, \pi) \in Q , \qquad \alpha = \sum_{i=1}^{L} \alpha(s_i) , \quad \pi = \sum_{i=1}^{L} \pi(s_i) . \tag{15} $$

We emphasize that not all combinations of $\alpha$, $\pi$ are possible. The structure of $\Phi$ is determined precisely by these restrictions.
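
The two valuation rules, the scheme encoding and the resulting phenotype can be reproduced with a few lines of code. This is a sketch under two assumptions: the sequences are closed to rings as stated in the text, and the symbol * marks a letter that is neither alphabetical nor p-periodic (this choice reproduces the example scheme @@@*$#@ given above). All function names are illustrative.

```python
ALPHABET = "ABCD"


def alpha_pi(s, p=5):
    """Per-position indicators (alpha_i, pi_i) for a ring sequence s."""
    L = len(s)
    alpha, pi = [], []
    for i in range(L):
        successor = ALPHABET[(ALPHABET.index(s[i]) + 1) % len(ALPHABET)]
        alpha.append(1 if s[(i + 1) % L] == successor else 0)   # rule (i)
        pi.append(1 if s[i] == s[(i + p) % L] else 0)           # rule (ii)
    return alpha, pi


def scheme(s, p=5):
    """Encode s as a string over the symbols @, #, $ and *."""
    symbols = {(1, 0): "@", (0, 1): "#", (1, 1): "$", (0, 0): "*"}
    alpha, pi = alpha_pi(s, p)
    return "".join(symbols[(a, q)] for a, q in zip(alpha, pi))


def phenotype(s, p=5):
    """Numbers (alpha, pi) of alphabetical and p-periodic letters."""
    alpha, pi = alpha_pi(s, p)
    return sum(alpha), sum(pi)


def valuation(s, b=0.1, p=5):
    """Fitness V(s) = alpha + b * pi."""
    a, q = phenotype(s, p)
    return a + b * q


for seq in ("BCDADAA", "CDABABB", "BABCDAA", "BCDABCA"):
    print(seq, scheme(seq), phenotype(seq), round(valuation(seq), 2))
```

The first two sequences share the scheme @@@*$#@ and the phenotype (5, 2); the third has a different scheme but the same phenotype, and the last one reaches the phenotype (6, 0).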

Now we are able to show the smoothing action of the genotype-phenotype map. We consider sequences of the length L = 7 with b = 0.1 and choose a Hamiltonian path through the genotype space G. That means we give every sequence $s_i \in G$ a Gödel number i in such a manner that $s_i$ and $s_{i+1}$ are neighbours with respect to the Hamming metric of G. Fig. 6 shows the fitness values of the sequences numbered along the Hamiltonian path in a representative range. It is easy to see that the valuation landscape has a very rugged structure; the degree of $\epsilon$-continuity is 1.26.



[Plot: V(s) versus the sequence number s, shown for s = 4800 to 4900]

FIG. 6. The valuation landscape on the genotype space, L = 7.

The fitness landscape on the phenotype space $F(\alpha, \pi)$ is shown in Fig. 7. We can see that the landscape over $(\alpha, \pi)$ is very smooth. Not all combinations $(\alpha, \pi)$ are possible phenotypes; the optimum F(6, 0) = 6.0 is an isolated island on the landscape (right side of Fig. 7). The degree of $\epsilon$-continuity is 0.379.




[Plot: fitness over the phenotype coordinates; one axis is labelled alpha]

FIG. 7. The fitness landscape on the phenotype space, L = 7.

The figures clearly show the smoothing action of the 
genotype-phenotype map. 

The density of states has been defined as the number of genotypes belonging to a certain phenotype. Fig. 8 shows the density over the phenotypes $n(\alpha, \pi)$ and over the fitness levels n(F). On the one hand, the density of states seems to be a very rugged function when we look at n(F). On the other hand, the density over the phenotypes $\phi = (\alpha, \pi)$ is a very smooth landscape. This interesting feature underlines the importance of the right choice of the genotype-phenotype map: the phenotypes obtained by (15) seem to be the natural representation of the problem.
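
For L = 7 the density of states can be obtained by complete enumeration of all 4^7 genotypes. The following sketch assumes ring sequences, p = 5 and b = 0.1 as in the text; the function names are illustrative and the rounding of fitness values only serves to merge floating-point duplicates.

```python
from collections import Counter
from itertools import product


def phenotype(s, p=5, letters="ABCD"):
    """(alpha, pi) of a ring sequence: alphabetical and p-periodic counts."""
    L, A = len(s), len(letters)
    idx = [letters.index(c) for c in s]
    alpha = sum((idx[i] + 1) % A == idx[(i + 1) % L] for i in range(L))
    pi = sum(idx[i] == idx[(i + p) % L] for i in range(L))
    return alpha, pi


def exact_density(L=7, b=0.1, p=5, letters="ABCD"):
    """Count genotypes per phenotype and per fitness level by enumeration."""
    n_pheno, n_fit = Counter(), Counter()
    for s in product(letters, repeat=L):
        a, q = phenotype(s, p, letters)
        n_pheno[(a, q)] += 1
        n_fit[round(a + b * q, 10)] += 1
    return n_pheno, n_fit


n_pheno, n_fit = exact_density()
print(sum(n_pheno.values()))        # 16384 genotypes in total
print(max(n_fit), n_pheno[(6, 0)])  # 6.0 28
```

The optimum (6, 0) is realised by 28 genotypes: seven possible positions of the single break in the alphabetical order times four starting letters.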




FIG. 8. The density of states n(F) and $n(\alpha, \pi)$, L = 7.



VI. THE DENSITY OF STATES OF LONG 
SEQUENCES 



The number of possible sequences with L = 100 is approximately $10^{60}$, thus the direct calculation of $n(\alpha, \pi)$ or n(F) is impossible. Fortunately, it is well known from statistical physics and the theory of thermodynamic strategies (Andresen, 1989; Sibiani et al., 1990; Berry et al., 1993) that for a Boltzmann strategy (Metropolis algorithm) the equilibrium density of realizations of values obeys the canonical distribution

$$ P_{eq}(V) \sim n(V) \exp\left( \frac{V}{T} \right) . $$

The density of states is given by

$$ n(V) \sim P_{eq}(V) \exp\left( -\frac{V}{T} \right) . \tag{16} $$



To obtain the equilibrium density $P_{eq}(V)$ we simulated an ensemble of N = 10,000 sequences of length L = 100 which carry out a Boltzmann strategy with the mutation rate (8) and the potential (9). The density P(V) is approximated by the frequency N(V)/N, where N(V) is the number of individuals with a value in the interval $[V, V + \Delta V)$. In the long-time limit P(V) tends towards the equilibrium density $P_{eq}(V)$. After 10,000 time steps P(V) had relaxed into equilibrium. We tested the convergence behaviour up to 100,000 time steps.

We ran the simulation at two different temperatures to scan the whole range of V. The resulting density of states is, to a very good approximation, a Gaussian distribution, as can be seen from Fig. 9.

[Plot: n(F) for F between 20 and 80, with T = 5 and the Gaussian fit parameters $\mu = 29.9$, $\sigma = 4.93$]

FIG. 9. The density of states n(F), L = 100.

We emphasize that the method also works for other fitness landscapes and is not restricted to sequences. The simulation of an ensemble of Boltzmann searchers is a general way to obtain the density of states and therefore a method to classify the fitness landscape of any optimization problem.
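
The ensemble method can be sketched in a few lines. The code below is a small-scale illustration, not the simulation reported above: it assumes ring sequences with the two-rule valuation, a Metropolis acceptance that always accepts improvements and accepts deteriorations with probability exp(dV/T), and a reweighting of the histogram with exp(-V/T). The ensemble size, number of sweeps, bin width and all function names are hypothetical choices.

```python
import math
import random
from collections import Counter


def valuation(idx, b, p=5, A=4):
    """V(s) = alpha + b * pi for a ring given as a list of letter indices."""
    L = len(idx)
    alpha = sum((idx[i] + 1) % A == idx[(i + 1) % L] for i in range(L))
    pi = sum(idx[i] == idx[(i + p) % L] for i in range(L))
    return alpha + b * pi


def boltzmann_ensemble(L, T, b, p=5, N=100, sweeps=20, A=4, seed=0):
    """Run N independent Metropolis searchers; return their final values."""
    rng = random.Random(seed)
    finals = []
    for _ in range(N):
        s = [rng.randrange(A) for _ in range(L)]
        v = valuation(s, b, p, A)
        for _ in range(sweeps * L):
            i = rng.randrange(L)
            old = s[i]
            s[i] = (old + rng.randrange(1, A)) % A   # point mutation
            v_new = valuation(s, b, p, A)
            # improvements are always kept, deteriorations with exp(dV/T)
            if v_new >= v or rng.random() < math.exp((v_new - v) / T):
                v = v_new
            else:
                s[i] = old
        finals.append(v)
    return finals


def reweighted_density(finals, T, bin_width=1.0):
    """Estimate n(V) ~ P(V) * exp(-V/T) per bin, normalised to sum 1."""
    hist = Counter(math.floor(v / bin_width) for v in finals)
    weights = {k * bin_width: c * math.exp(-k * bin_width / T)
               for k, c in hist.items()}
    Z = sum(weights.values())
    return {v: w / Z for v, w in sorted(weights.items())}


finals = boltzmann_ensemble(L=20, T=5.0, b=0.1)
for v, w in reweighted_density(finals, T=5.0).items():
    print(v, round(w, 4))
```

A single temperature samples only a limited window of fitness values well, which is why runs at several temperatures are needed to cover the whole range of V.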

In the special case of a fitness landscape like (9) we are able to calculate the structure of n(V) by means of the mutation group $\mathcal{G}$.

Let S be the space of schemes and let $V : G \to \mathbb{R}$ be the valuation function, where G is the sequence space. Now we introduce an equivalence relation by

$$ s_1 \sim s_2 \iff V(s_1) = V(s_2) \qquad \forall\, s_1, s_2 \in G . $$

Thus the set of equivalence classes $G/\!\sim$ is, up to a small set of combinatorial operations, equal to the space of scheme states S. We are interested in the question: how many sequences with the same fitness value exist? To this end we introduce a map $n : \mathbb{R} \to \mathbb{R}$ which gives for every fitness value the number of states occupying this value. We argue that for large sequence lengths L the number of combinatorial operations for every equivalence class is constant.

Consider now a small shift of the valuation $V \to V + \Delta V$. This shift can be expressed by the group action of the mutation group $\mathcal{G}$, which is defined in the following sense. The group relations

$$ g_i g_i^{-1} = e , \quad i = 1, 2, 3, 4 , \qquad \text{and} \qquad g_1 g^{-1} g_3 = e \tag{17} $$

in $\mathcal{G}$ are defined locally, that means for all places in the string. Now we define the group action by the action of every generator on the sequence with valuation, subject to the relations (17). Because of correlations we obtain the following restrictions for two generators acting on two successive places in the string

(ga^i+iigih (gs^i+iigdi 

where $(g)_i$ means the action of the generator g on the i-th place. The actions on the i-th and (i+2)-th places are independent. The next step is the formal introduction of derivatives over the sequence space via group actions. Since Connes' non-commutative geometry, many physicists have applied such methods to the case of discrete sets (Müller-Hoisen et al., 1993). We define the derivative of V in the i-th direction of the scheme space by

^-(.s) = ^(V(g S ) - V(.s)) = \{L g - l)V(s) s e S 

(18) 



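where $L_g$ is the left translation given by $L_g V(s) = V(gs)$. From the ordinary rules of differentiation we obtain

$$ \frac{dn}{dV} = \sum_{g \in \mathcal{G}} \frac{dg}{dV} \frac{dn}{dg} . \tag{19} $$

If we introduce the function

$$ \delta(x - x_0) = \begin{cases} 1 & x = x_0 \\ 0 & \text{else} \end{cases} \tag{20} $$

then it follows

$$ n(V(s)) = \sum_{g \in \mathcal{G}} \delta(V(gs) - V(s)) = \sum_{g \in \mathcal{G}} \delta((L_g - 1) V(s)) . \tag{21} $$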

For instance this expression can be approximated by

$$ \sum_{g \in \mathcal{G}} \delta(V(gs) - V(s)) \approx \sum_{g \in \mathcal{G}} \exp\left( -B\, (V(gs) - V(s))^2 \right) \tag{22} $$

with $B \sim L$ as a suitable number. Let $h : \mathbb{R} \to \mathbb{R}$ be an arbitrary function without singularity at $x_0$; then

$$ h(x)\, \delta(x - x_0) = h(x_0) \tag{23} $$

$$ \frac{d}{dx} h(x)\, \delta(x - x_0) + h(x)\, \frac{d}{dx} \delta(x - x_0) = 0 . \tag{24} $$

Together with (24) we obtain

$$ \frac{dn}{dV} = - \sum_{g \in \mathcal{G}} \frac{h'(V)}{h(V)}\, \delta(V(gs) - V(s)) . \tag{25} $$

Comparing with (19) and together with the obvious relation

$$ \frac{dn}{dg}(g_1) = \delta(V(g_1 s) - V(s)) \tag{26} $$

it follows

$$ \frac{dg}{dV} = - \frac{h'(V)}{h(V)} . \tag{27} $$

This formula can be interpreted as "density of generators acting on the scheme state with valuation V". Next we will calculate this expression by arguments relating to the structure of the group $\mathcal{G}$. The action of two of the generators leads to an increase of the valuation by 1 or b, respectively. The probability of making such a mutation is twice as big as the probability of a mutation with the remaining generator. This follows simply from the relations (17). Because of the linearly defined valuation we obtain

$$ \frac{dg}{dV}(V) \sim V . \tag{28} $$

If the valuation increases, the density of generators will decrease because the number of states * decreases after every mutation. That means

$$ \frac{dg}{dV}(V) = A\, (V_0 - V) \tag{29} $$

with suitable constants $V_0$, A. Together with (25) and (27) the differential equation

$$ \frac{dn}{dV} = -A\, (V - V_0)\, n \tag{30} $$

is obtained. The solution of this equation (for large L) is simply, after absorbing numerical factors into the constants A and $V_0$,

$$ n(V) = n(0) \exp\left( -A (V^2 - V_0 V) \right) \tag{31} $$

$$ \phantom{n(V)} = n(0) \exp\left( A V_0^2 / 4 \right) \exp\left( -A (V - V_0/2)^2 \right) . \tag{32} $$

Thus the qualitative structure of the degeneracy distribution is obtained by such methods. The proof of all formulas is not mathematically rigorous, but this could be done following (Müller-Hoisen et al., 1993).



VII. CONCLUSIONS 

We have shown that the investigation of evolution on 
discrete spaces lead to a number of principal problems, 
e.g.: 

(i) The topological properties of the genotype and phe- 
notype space. 

(ii) The structure of the fitness landscape. 

(iii) The smoothness of the fitness landscape and the 
representation by the genotype-phenotype map. 

(iv) The classification of fitness landscapes and the
derivation of optimal search strategies.

The question of the structure of the fitness landscape and its classification is connected with the topological properties of the underlying spaces. We have shown that the choice of the Hamming metric on the genotype space leads to a rugged valuation landscape on which the evolutionary search is very difficult. On the other hand, it is possible to smooth the landscape by a suitable representation of the problem. The genotype-phenotype map transforms the genotype space and its metric to the phenotype space and a new metric. On this space the principle of strong causality is valid: the change of fitness between two neighbouring phenotypes, with respect to the new metric, is very small. The genotype-phenotype map increases the degree of ε-continuity of the landscape. The introduction of the new metric on the phenotype space may also be described by new mutation operators on the genotype space: suitable mutation operators transform a genotype in such a manner that the corresponding phenotypes are neighbours with respect to the new metric on the phenotype space. The determination of the genotype-phenotype map and the construction of new mutation operators are two interpretations of the same problem: the problem of a smooth representation of the fitness landscape.

The density of states is a measure of the difficulty of an optimization problem. This measure is invariant under the genotype-phenotype map, or any other choice of representation of the fitness landscape, and characterizes only the problem itself. That makes it feasible to use the density of states as a classifying measure of fitness landscapes.
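The two claims above (representation changes the smoothness, but not the density of states) can be illustrated on a toy landscape. In the sketch below everything is an assumption chosen for illustration: 4-bit genotypes, the fitness `fitness(p) = p // 2`, the standard binary and reflected Gray decodings as two candidate genotype-phenotype maps, and unit steps as an alternative mutation operator. The quantity `epsilon` is the largest fitness change between neighbours.

```python
from collections import Counter

L = 4
GENOTYPES = range(2 ** L)  # each integer stands for an L-bit sequence

def fitness(p):
    """Hypothetical smooth fitness on phenotypes p = 0 .. 2**L - 1."""
    return p // 2

def binary_map(g):
    """Representation 1: read the genotype as a binary number."""
    return g

def gray_map(g):
    """Representation 2: decode the genotype as a reflected Gray code."""
    p = 0
    while g:
        p ^= g
        g >>= 1
    return p

def bit_flips(g):
    """Hamming neighbours: flip exactly one bit."""
    return [g ^ (1 << i) for i in range(L)]

def unit_steps(g):
    """Alternative mutation operator: move to an adjacent integer."""
    return [h for h in (g - 1, g + 1) if h in GENOTYPES]

def density_of_states(phi):
    """Number of genotypes per fitness value under representation phi."""
    return Counter(fitness(phi(g)) for g in GENOTYPES)

def epsilon(phi, mutate):
    """Largest fitness change between a genotype and its mutants."""
    return max(abs(fitness(phi(g)) - fitness(phi(h)))
               for g in GENOTYPES for h in mutate(g))

dos_binary = density_of_states(binary_map)
dos_gray = density_of_states(gray_map)
eps_binary = epsilon(binary_map, bit_flips)
eps_gray = epsilon(gray_map, bit_flips)
eps_steps = epsilon(binary_map, unit_steps)
print(dos_binary == dos_gray, eps_binary, eps_gray, eps_steps)
```

In this toy case the density of states is identical for both representations, while the roughness measure depends strongly on the representation and on the mutation operator.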
APPENDIX A

Lemma 1. 

Proof 1. First we fix the topology in the sequence space given by the neighbourhood structure $\Delta_G$. The neighbourhood of a particular sequence $s \in G$ will be denoted by $N_G(s)$. Next we introduce the equivalence relation

$$s_1 \sim s_2 \iff V(s_1) = V(s_2) \qquad \forall s_1, s_2 \in G$$

and form the quotient space $G/\sim$. $G$ carries a natural semigroup structure given by concatenation of letters. This structure can be extended to $G/\sim$. In general, the building of equivalence classes leads to a change in the topology. Consider two sequences $s, s' \in G$ with $\Delta_G(s', s) = 0$ and $V(s) = V(s')$. Both sequences are representatives of the same equivalence class. The neighbourhood of each sequence contains, by definition, the sequences which differ from it in only one letter. It is obvious that this fact changes the metric and the topology. Let $\Delta_\sim$ be the neighbourhood structure in $G/\sim$ induced from $\Delta_G$ in $G$. Consider three sequences $s, s', s'' \in G$ with $s \sim s'$ and $\Delta_G(s, s'') = 1$ ($s''$ is in the neighbourhood of $s$); then we simply obtain

$$\Delta_\sim(s, s') = 0, \qquad \Delta_\sim(s, s'') = 1, \qquad \Delta_\sim(s', s'') = 1$$

So it is obvious that the valuation function $V$ over $G/\sim$ induced from the valuation of $G$ has the same ε number, that is,

$$\max_{\substack{s, s' \in G \\ \Delta_G(s, s') = 1}} |V(s) - V(s')| = \max_{\substack{s, s' \in G/\sim \\ \Delta_\sim(s, s') = 1}} |V(s) - V(s')| \tag{A1}$$
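The quotient construction and relation (A1) can be checked on a small example. The sketch below is only an illustration under assumed choices: binary sequences of length 4, the hypothetical valuation `V`, Hamming distance 1 as the neighbourhood structure on G, and two classes counted as neighbours whenever any of their members are neighbours.

```python
from itertools import product

L = 4
G = list(product((0, 1), repeat=L))  # toy sequence space

def V(s):
    """Hypothetical degenerate valuation: half the number of ones."""
    return sum(s) // 2

def is_neighbour(s, t):
    """Neighbourhood structure on G: the sequences differ in one letter."""
    return sum(a != b for a, b in zip(s, t)) == 1

# Equivalence classes of s1 ~ s2 <=> V(s1) = V(s2), labelled by valuation.
classes = {}
for s in G:
    classes.setdefault(V(s), []).append(s)

def classes_adjacent(v, w):
    """Induced neighbourhood: some members of the two classes are neighbours."""
    return v != w and any(is_neighbour(s, t)
                          for s in classes[v] for t in classes[w])

# Left-hand side of (A1): largest change between neighbours in G.
eps_G = max(abs(V(s) - V(t)) for s in G for t in G if is_neighbour(s, t))

# Right-hand side of (A1): largest change between neighbouring classes.
eps_quotient = max(abs(v - w) for v in classes for w in classes
                   if classes_adjacent(v, w))

print(sorted(classes), eps_G, eps_quotient)
```

Because the valuation is constant on each class and class adjacency is inherited from sequence adjacency, the two maxima coincide in this example.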

But the induced function on $G/\sim$ differs from $V$ on $G$ in that it is bijective. The ambiguity of the valuation function $V$ is encoded in the topological structure of $G/\sim$. To prove this assertion we consider point mutations of the sequence. The set of point mutations forms a group denoted by $\mathcal{G}$, which together with a group action $\alpha : \mathcal{G} \times G \to G$ determines the point mutations completely. This group is generated by a finite number of elements $g_1, \ldots, g_\nu$ (generators), which are equal to the elementary mutations. Because different sequences with the same valuation exist, there is a finite number $k$ of mutations, represented by a sequence of generators $g_{i_1} g_{i_2} \cdots g_{i_k}$, with the following property:

$$s \sim \alpha(g_{i_1} g_{i_2} \cdots g_{i_k}, s), \qquad s \in G$$

Thus the ambiguity comes from loops generated by the building of equivalence classes and represented by the mutation group $\mathcal{G}$. To eliminate this ambiguity we have to change the topology of $G/\sim$ without changing the bijective map $V$. This can be done by the following rules:

(i) Cut the line in the loop which represents the largest change in the valuation.

(ii) Connect the disjoint parts of the space generated by rule (i) in such a way that the change in the valuation is minimized.

The rules generate a connected space denoted by $G_s/\sim$. The valuation function $V$ is not changed by this procedure and is denoted by $V_s$. Because of this fact we obtain from the rules above that

$$\max_{\substack{s, s' \in G \\ \Delta_G(s, s') = 1}} |V(s) - V(s')| \geq \max_{\substack{s, s' \in G_s/\sim \\ f_s(s, s') = 1}} |V_s(s) - V_s(s')| \tag{A2}$$
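The effect stated in (A2) can be sketched numerically. The code below does not implement the cut-and-reconnect rules (i) and (ii) step by step; as a simplifying assumption it builds one possible end result directly, namely a chain in which the valuation classes are sorted and only consecutive classes are neighbours. The binary sequences of length 4 and the rugged valuation `V` are likewise hypothetical choices for illustration.

```python
from itertools import product

L = 4
G = list(product((0, 1), repeat=L))  # toy sequence space

def V(s):
    """Hypothetical rugged valuation: the sequence read as a binary number."""
    return int("".join(map(str, s)), 2)

def is_neighbour(s, t):
    """Neighbourhood structure on G: the sequences differ in one letter."""
    return sum(a != b for a, b in zip(s, t)) == 1

# Left-hand side of (A2): largest change between neighbours in G.
eps_G = max(abs(V(s) - V(t)) for s in G for t in G if is_neighbour(s, t))

# Assumed smoothed space: one node per distinct valuation, arranged as a
# chain in increasing order, so neighbouring nodes have minimal change.
levels = sorted({V(s) for s in G})
p_s = {s: levels.index(V(s)) for s in G}  # map from G to the chain nodes
V_s = dict(enumerate(levels))             # valuation on the chain nodes

# Right-hand side of (A2): largest change between neighbouring chain nodes.
eps_s = max(abs(V_s[i + 1] - V_s[i]) for i in range(len(levels) - 1))

# The valuation itself is unchanged: V_s composed with p_s reproduces V.
unchanged = all(V_s[p_s[s]] == V(s) for s in G)

print(eps_G, eps_s, unchanged)
```

In this example the largest neighbour-to-neighbour change drops from 8 to 1 while every sequence keeps its valuation, which is the sense in which the smoothed valuation is smoother than the original one.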



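with $f_s$ as the metric in $G_s/\sim$. We thus obtain the very nice fact that the valuation function $V_s$ is smoother than $V$. If we denote the map from $G$ to $G_s/\sim$ by $p_s$, then the following commuting diagram is obtained:

$$\begin{array}{ccc}
G & \xrightarrow{\;p_s\;} & G_s/\sim \\
{\scriptstyle \Phi}\downarrow & & \downarrow{\scriptstyle V_s} \\
Q & \xrightarrow{\;F\;} & \mathbb{R}_+
\end{array}$$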

i.e. $V_s \circ p_s = V = F \circ \Phi$. Because the functions $V_s$ and $F$ are bijective, the genotype-phenotype map $\Phi = F^{-1} \circ V_s \circ p_s$ is completely characterized by $p_s$ and has the properties required by Theorem 1. Thus the genotype-phenotype map is uniquely given by the construction defined above.
q.e.d.